Overview
Phase 2 enriches the core dataset by fetching regulatory filings, news, corporate actions, surveillance lists, and market-specific data. All scripts in this phase depend on master_isin_map.json from Phase 1.
Phase 2 includes Phase 2.5 (OHLCV Data), which can be toggled via FETCH_OHLCV = True/False in run_full_pipeline.py.
Execution Order
Phase 2 runs 13 scripts, in parallel where possible.
Script 1: fetch_company_filings.py
Purpose
Fetches regulatory filings from two endpoints and merges the results for maximum coverage.
API Endpoints
Request Payload
Output Files
| File Pattern | Description | Count |
|---|---|---|
| company_filings/{SYMBOL}_filings.json | Merged filings per stock | 2,775 files |
Deduplication Logic
Filings are deduplicated on the composite key news_id + news_date + caption, so a filing returned by both endpoints is stored only once.
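As an illustration of the deduplication rule above, here is a minimal sketch. The function name dedupe_filings and the "first copy wins" policy are assumptions made for the example; only the three key fields come from the text, and the script itself may handle ties differently.

```python
def dedupe_filings(*sources):
    """Merge filings from several endpoints, keeping the first copy of each key.

    Hypothetical helper: only the key fields are taken from the documentation.
    """
    seen = set()
    merged = []
    for source in sources:
        for filing in source:
            # Composite key named in the text: news_id + news_date + caption.
            key = (
                filing.get("news_id"),
                filing.get("news_date"),
                filing.get("caption"),
            )
            if key in seen:
                continue  # already collected from an earlier endpoint
            seen.add(key)
            merged.append(filing)
    return merged
```

Using a tuple as the key avoids accidental collisions that plain string concatenation of the three fields could produce.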
Threading
- Workers: 20 concurrent threads
- Typical Time: ~3-5 minutes
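The worker-pool pattern used throughout this phase can be sketched with the standard library. This is not the script's actual code: fetch_all and the fetch_one callable are hypothetical names, and the default of 20 workers simply mirrors the figure quoted above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(symbols, fetch_one, max_workers=20):
    """Call fetch_one(symbol) for every symbol on a thread pool.

    Returns (results, errors), both keyed by symbol, so one failing
    symbol does not discard the work done for the others.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, symbol): symbol for symbol in symbols}
        for future in as_completed(futures):
            symbol = futures[future]
            try:
                results[symbol] = future.result()
            except Exception as exc:  # keep going; record the failure
                errors[symbol] = str(exc)
    return results, errors
```

The other scripts in this phase differ mainly in the worker count (15 to 50), which would map to the max_workers argument in a sketch like this.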
Script 2: fetch_new_announcements.py
Purpose
Fetches live corporate announcements (non-regulatory) from Dhan.
API Endpoint
Request Payload
Output Files
| File | Description | Size |
|---|---|---|
| all_company_announcements.json | All announcements across stocks | ~8 MB |
Threading
- Workers: 40 concurrent threads
- Typical Time: ~2-3 minutes
Script 3: fetch_advanced_indicators.py
Purpose
Fetches technical indicators: Pivot Points, EMA/SMA signals, MACD, RSI sentiment.
API Endpoint
Request Payload
Output Files
| File | Description | Size |
|---|---|---|
| advanced_indicator_data.json | All technical indicators | ~8.3 MB |
Data Fetched
- Pivot Point (daily)
- SMA Status (20, 50, 200)
- EMA Status (20, 200)
- RSI Sentiment
- MACD Sentiment
Threading
- Workers: 50 concurrent threads
- Typical Time: ~2 minutes
Script 4: fetch_market_news.py
Purpose
Fetches AI-sentiment-tagged news (up to 50 articles per stock).
API Endpoint
Request Payload
Output Files
| File Pattern | Description | Count |
|---|---|---|
| market_news/{SYMBOL}_news.json | News per stock | 2,775 files |
Pagination
- Default limit: 50 per stock
- Max tested: 100 (via page_no iteration)
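Iterating over page_no to go past the default limit might look like the sketch below. The fetch_page callable, the 1-based page numbering, and the stop conditions are assumptions for illustration; only the page_no parameter and the 50/100 figures come from the text.

```python
def fetch_news(fetch_page, limit=50, max_records=100):
    """Collect articles page by page until max_records is reached.

    fetch_page(page_no, limit) is a hypothetical callable that returns
    a list of articles for one page.
    """
    articles = []
    page_no = 1  # assumption: pages are numbered from 1
    while len(articles) < max_records:
        page = fetch_page(page_no, limit)
        if not page:
            break  # no more data
        articles.extend(page)
        if len(page) < limit:
            break  # a short page means this was the last one
        page_no += 1
    return articles[:max_records]
```

With the defaults, two full pages of 50 are requested and the result is capped at 100 articles.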
Threading
- Workers: 15 concurrent threads
- Typical Time: ~4-6 minutes
Script 5: fetch_corporate_actions.py
Purpose
Fetches corporate actions (dividends, bonus, splits) in two modes:
- History: Last 2 years
- Upcoming: Next 2 months
API Endpoint
Request Payload
Output Files
| File | Description | Timeframe |
|---|---|---|
| upcoming_corporate_actions.json | Future actions | Next 2 months |
| history_corporate_actions.json | Past actions | Last 2 years |
Typical Time
~10-15 seconds — Two API calls with count: 5000
Script 6: fetch_surveillance_lists.py
Purpose
Fetches NSE ASM/GSM surveillance lists (stocks under additional surveillance measures).
Data Sources
- Primary: Google Sheets Gviz endpoint (NSE-hosted)
- Fallback: Dhan Next.js API
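The primary/fallback arrangement listed above can be expressed as a small wrapper. This is a sketch only: fetch_with_fallback and the two callables are hypothetical, and treating an empty primary response as a failure is an assumption rather than documented behavior.

```python
def fetch_with_fallback(primary, fallback):
    """Try the primary source first; use the fallback if it fails or is empty.

    Returns (rows, source_name) so the caller can log which source was used.
    """
    try:
        rows = primary()
        if rows:
            return rows, "primary"
    except Exception as exc:
        print(f"[WARN] primary source failed: {exc}")
    # Assumption: an empty primary result is treated like a failure.
    return fallback(), "fallback"
```

Returning the source name alongside the rows makes it easy to record in the output which endpoint actually supplied the list.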
Output Files
| File | Description |
|---|---|
| nse_asm_list.json | Additional Surveillance Measure (ASM) stocks |
| nse_gsm_list.json | Graded Surveillance Measure (GSM) stocks |
Typical Time
~5-10 seconds — Direct CSV/API fetch
Script 7: fetch_circuit_stocks.py
Purpose
Fetches stocks that hit their upper or lower circuit limits today.
API Endpoint
Request Payload
Upper Circuit Filter
Output Files
| File | Description |
|---|---|
| upper_circuit_stocks.json | Stocks at upper circuit |
| lower_circuit_stocks.json | Stocks at lower circuit |
Typical Time
~5 seconds — Two API calls
Script 8: fetch_bulk_block_deals.py
Purpose
Fetches bulk/block deals from the last 30 days.
API Endpoint
Request Payload
Output Files
| File | Description |
|---|---|
| bulk_block_deals.json | All deals (auto-paginated) |
Pagination
Auto-paginates through all pages (50 records/page).
Typical Time
~10-15 seconds — Depends on total deals
Script 9 & 10: Price Bands
fetch_incremental_price_bands.py
Fetches daily price band changes from NSE.
Output: incremental_price_bands.json
fetch_complete_price_bands.py
Fetches the complete price band list for all securities.
Output: complete_price_bands.json
Typical Time
~5 seconds each — Direct CSV downloads
Script 11: fetch_all_indices.py
Purpose
Fetches all 194 market indices (NIFTY 50, BANKNIFTY, sectoral indices).
API Endpoint
Output Files
| File | Description | Records |
|---|---|---|
| all_indices_list.json | All NSE indices | 194 |
Typical Time
~5 seconds — Single API call
Phase 2.5: OHLCV Data (Optional)
Script 12: fetch_all_ohlcv.py
Purpose
Fetches lifetime daily OHLCV candles for all stocks with smart incremental updates.
API Endpoint
Request Payload
Output Files
| File Pattern | Description | Count |
|---|---|---|
| ohlcv_data/{SYMBOL}.csv | Daily OHLCV candles | 2,775 files |
Smart Incremental Update
- First run: Fetches full history (~30 minutes)
- Subsequent runs: Only fetches missing days (~2-5 minutes)
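One way to decide between a full and an incremental fetch is to read the last stored date from the existing CSV. The sketch below assumes a header row with an ISO-formatted "date" column; that column name and next_fetch_start itself are hypothetical, since the CSV layout is not reproduced here.

```python
import csv
from datetime import date, timedelta
from pathlib import Path


def next_fetch_start(csv_path, date_column="date"):
    """Return the first date still missing from the CSV.

    Returns None when the file is absent or has no rows, which a caller
    would treat as "fetch the full history".
    """
    path = Path(csv_path)
    if not path.exists():
        return None
    last = None
    with path.open(newline="") as handle:
        for row in csv.DictReader(handle):
            value = row.get(date_column)
            if not value:
                continue
            current = date.fromisoformat(value)  # assumes YYYY-MM-DD
            if last is None or current > last:
                last = current
    if last is None:
        return None
    return last + timedelta(days=1)
```

A caller would request candles from the returned date up to today and append them to the file, which is why subsequent runs touch only a few rows per stock.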
CSV Structure
Threading
- Workers: 15 concurrent threads
- Typical Time:
  - First run: ~30 minutes
  - Incremental: ~2-5 minutes
Script 13: fetch_indices_ohlcv.py
Purpose
Fetches historical OHLCV for all 194 indices (high-speed specialized endpoint).
Output Files
| File Pattern | Description |
|---|---|
| ohlcv_data/indices/{INDEX}.csv | Index OHLCV candles |
Typical Time
~1-2 minutes — Optimized for indices
Phase 2 Output Summary
Files Produced
Performance Metrics
Total Phase 2 Time (without OHLCV)
~8-12 minutes for all enrichment scripts
Total Phase 2 Time (with OHLCV)
- First run: ~35-40 minutes
- Incremental: ~10-15 minutes
Bottlenecks
- OHLCV fetch: Largest time consumer (~30 min first run)
- News fetch: 15 threads × 2,775 stocks (~5 min)
- Filings fetch: Dual endpoint merge (~4 min)
Dependencies on Phase 1
Error Handling
Phase 2 scripts use soft failure mode.
Impact of Failures
- Filings fail: Event markers will miss filing-based triggers
- News fail: News Feed field will be empty
- OHLCV fail: ADR, RVOL, ATH metrics will be 0
- Corporate actions fail: Event markers will miss dividend/bonus/split icons
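Soft failure, in the sense that one failed step degrades downstream fields rather than stopping the run, could be sketched as follows. The run_soft helper and the (name, callable) step format are assumptions for illustration; how run_full_pipeline.py actually invokes each script is not shown here.

```python
def run_soft(steps):
    """Run each (name, callable) step; log failures and keep going.

    Returns the names of the steps that failed so later phases can
    decide which fields to leave empty or zeroed.
    """
    failed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Soft failure: warn and continue with the next step.
            print(f"[WARN] {name} failed: {exc}")
            failed.append(name)
    return failed
```

The returned list gives later phases a simple way to tell which inputs are missing, for example to leave the News Feed field empty when the news step failed.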
Next Phase
Once Phase 2 completes, the pipeline proceeds to:
Phase 3: Base Analysis
Builds the master all_stocks_fundamental_analysis.json by merging all Phase 1 and Phase 2 data.